11 research outputs found

    Exhaustive Symbolic Regression

    Full text link
    Symbolic Regression (SR) algorithms learn analytic expressions which both accurately fit data and, unlike traditional machine-learning approaches, are highly interpretable. Conventional SR suffers from two fundamental issues which we address in this work. First, since the number of possible equations grows exponentially with complexity, typical SR methods search the space stochastically and hence do not necessarily find the best function. In many cases, the target problems of SR are sufficiently simple that a brute-force approach is not only feasible, but desirable. Second, the criteria used to select the equation which optimally balances accuracy with simplicity have been variable and poorly motivated. To address these issues we introduce a new method for SR -- Exhaustive Symbolic Regression (ESR) -- which systematically and efficiently considers all possible equations and is therefore guaranteed to find not only the true optimum but also a complete function ranking. Utilising the minimum description length principle, we introduce a principled method for combining these preferences into a single objective statistic. To illustrate the power of ESR we apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding $\sim 40$ functions (out of 5.2 million considered) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not necessarily prefer a $\Lambda$CDM expansion history, and traditional SR algorithms that return only the Pareto front, even if they found this successfully, would not locate $\Lambda$CDM. We make our code and full equation sets publicly available. Comment: 14 pages, 6 figures, 2 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.
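
    To make the selection criterion concrete, the following toy sketch ranks candidate expressions by a two-part, MDL-style codelength (data misfit plus structure and parameter costs). It illustrates the principle only, not ESR's exact codelength; the complexity accounting and operator-set size are assumptions for the example.

    ```python
    # Toy MDL-style ranking of candidate expressions. Illustrative only:
    # NOT the exact codelength used by ESR.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    x = np.linspace(0.1, 2.0, 50)
    y = 1.0 + 0.5 * x**2 + rng.normal(0.0, 0.05, x.size)  # synthetic data

    # Candidate expressions: name -> (callable, assumed tree-node count)
    candidates = {
        "a":          (lambda x, a: a + 0 * x,             1),
        "a + b*x":    (lambda x, a, b: a + b * x,          5),
        "a + b*x^2":  (lambda x, a, b: a + b * x**2,       7),
        "a*exp(b*x)": (lambda x, a, b: a * np.exp(b * x),  6),
    }

    def description_length(rss, n, k, nodes, n_ops=8):
        """Two-part code: Gaussian NLL (up to a constant) + complexity terms."""
        nll = 0.5 * n * np.log(rss / n)       # misfit term
        structure = nodes * np.log(n_ops)     # cost of encoding the tree
        params = 0.5 * k * np.log(n)          # BIC-like cost of fitted parameters
        return nll + structure + params

    for name, (f, nodes) in candidates.items():
        k = f.__code__.co_argcount - 1        # number of free parameters
        popt, _ = curve_fit(f, x, y, p0=np.ones(k))
        rss = float(np.sum((y - f(x, *popt)) ** 2))
        print(f"{name:12s} DL = {description_length(rss, x.size, k, nodes):8.2f}")
    ```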

    Priors for symbolic regression

    Full text link
    When choosing between competing symbolic models for a data set, a human will naturally prefer the "simpler" expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on an $n$-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared through the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection. We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology. Comment: 8+2 pages, 2 figures. Submitted to The Genetic and Evolutionary Computation Conference (GECCO) 2023 Workshop on Symbolic Regression.
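
    As a concrete illustration of a structural prior, the sketch below scores an operator sequence with an add-alpha-smoothed bigram model trained on a toy corpus. The corpus, operator vocabulary and smoothing choice are invented for the example; in practice higher-order $n$-grams with back-off smoothing would be used.

    ```python
    # Toy bigram (2-gram) prior over operator sequences. Illustrative only.
    from collections import Counter
    from math import log

    # A toy "corpus" of operator sequences from previously seen expressions
    corpus = [
        ["add", "mul", "x", "x", "const"],
        ["mul", "const", "pow", "x", "const"],
        ["add", "x", "mul", "const", "exp"],
    ]

    unigrams, bigrams = Counter(), Counter()
    for seq in corpus:
        padded = ["<s>"] + seq                       # start-of-sequence token
        unigrams.update(padded)
        bigrams.update(zip(padded[:-1], padded[1:]))

    def log_prior(seq, alpha=1.0, vocab_size=10):
        """Add-alpha-smoothed bigram log-probability of an operator sequence."""
        lp, prev = 0.0, "<s>"
        for op in seq:
            num = bigrams[(prev, op)] + alpha
            den = unigrams[prev] + alpha * vocab_size
            lp += log(num / den)
            prev = op
        return lp

    print(log_prior(["add", "mul", "x", "const"]))   # higher = more "natural"
    ```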

    The Simplest Inflationary Potentials

    Full text link
    Inflation is a highly favoured theory for the early Universe. It is compatible with current observations of the cosmic microwave background and large scale structure and is a driver in the quest to detect primordial gravitational waves. It is also, given the current quality of the data, highly under-determined with a large number of candidate implementations. We use a new method in symbolic regression to generate all possible simple scalar field potentials for one of two possible basis sets of operators. Treating these as single-field, slow-roll inflationary models we then score them with an information-theoretic metric ("minimum description length") that quantifies their efficiency in compressing the information in the Planck data. We explore two possible priors on the parameter space of potentials, one related to the functions' structural complexity and one that uses a Katz back-off language model to prefer functions that may be theoretically motivated. This enables us to identify the inflaton potentials that optimally balance simplicity with accuracy at explaining the Planck data, which may subsequently find theoretical motivation. Our exploratory study opens the door to extraction of fundamental physics directly from data, and may be augmented with more refined theoretical priors in the quest for a complete understanding of the early Universe. Comment: 13+4 pages, 4 figures; submitted to Physical Review.
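
    For context, the sketch below computes the slow-roll parameters and the resulting spectral observables for one illustrative potential (the quadratic $V = \tfrac{1}{2} m^2 \phi^2$, not a result of the paper), assuming the standard leading-order slow-roll expressions in reduced Planck units.

    ```python
    # Slow-roll parameters for an illustrative inflaton potential.
    import sympy as sp

    phi, m = sp.symbols("phi m", positive=True)
    V = sp.Rational(1, 2) * m**2 * phi**2       # example potential V(phi)

    # Slow-roll parameters in reduced Planck units (M_pl = 1)
    eps = sp.Rational(1, 2) * (sp.diff(V, phi) / V) ** 2
    eta = sp.diff(V, phi, 2) / V

    # Predictions to leading order in slow roll
    n_s = 1 - 6 * eps + 2 * eta                 # scalar spectral index
    r = 16 * eps                                # tensor-to-scalar ratio

    # Evaluate at the field value giving N = 60 e-folds before inflation ends:
    # for V ~ phi^2, N is approximately phi^2 / 4, so phi_60 = sqrt(240)
    phi60 = sp.sqrt(240)
    print("n_s =", sp.simplify(n_s.subs(phi, phi60)))   # 29/30, about 0.967
    print("r   =", sp.simplify(r.subs(phi, phi60)))     # 2/15, about 0.13
    ```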

    Modeling and testing screening mechanisms in the laboratory and in space

    No full text
    The non-linear dynamics of scalar fields coupled to matter and gravity can lead to remarkable density-dependent screening effects. In this short review we present the main classes of screening mechanisms and discuss their tests in laboratory and astrophysical systems, such as stars, galaxies and dark matter halos. We particularly focus on the numerical and technical aspects involved in modeling the non-linear dynamics of screening.
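
    As a minimal illustration of density-dependent screening, the sketch below works out a chameleon-type model with $V_\mathrm{eff}(\phi) = \Lambda^5/\phi + \beta\rho\,\phi/M_\mathrm{pl}$: the field minimum shifts and the effective mass grows with ambient density, suppressing the fifth force. All parameter values are schematic assumptions.

    ```python
    # Density dependence of a chameleon-type scalar field. Schematic units.
    import numpy as np

    Lam, beta, M_pl = 1.0, 1.0, 1.0   # arbitrary illustrative values

    def phi_min(rho):
        """Minimum of V_eff: -Lam^5/phi^2 + beta*rho/M_pl = 0."""
        return np.sqrt(Lam**5 * M_pl / (beta * rho))

    def m_eff(rho):
        """Effective mass at the minimum: m^2 = V_eff'' = 2 Lam^5 / phi_min^3."""
        return np.sqrt(2 * Lam**5 / phi_min(rho) ** 3)

    for rho in (1e-3, 1.0, 1e3):
        print(f"rho={rho:8.0e}  phi_min={phi_min(rho):.3e}  m_eff={m_eff(rho):.3e}")
    # Denser environments -> smaller phi_min, larger m_eff -> shorter-ranged
    # fifth force: the field is "screened" in high-density regions.
    ```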

    No evidence for p- or d-wave dark matter annihilation from local large-scale structure

    No full text
    If dark matter annihilates into standard model particles with a cross-section which is velocity dependent, then Local Group dwarf galaxies will not be the best place to search for the resulting gamma-ray emission. A greater flux would be produced by more distant and massive halos, with larger velocity dispersions. We construct full-sky predictions for the gamma-ray emission from galaxy- and cluster-mass halos within $\sim 200\,\mathrm{Mpc}$ using a suite of constrained $N$-body simulations (CSiBORG) based on the Bayesian Origin Reconstruction from Galaxies algorithm. Comparing to observations from the Fermi Large Area Telescope and marginalising over reconstruction uncertainties and other astrophysical contributions to the flux, we obtain constraints on the cross-section which are two (seven) orders of magnitude tighter than those obtained from dwarf spheroidals for $p$-wave ($d$-wave) annihilation. We find no evidence for either type of annihilation from dark matter particles with masses in the range $m_\chi = 2$-$500\,\mathrm{GeV}/c^2$, for any channel. As an example, for annihilations producing bottom quarks with $m_\chi = 10\,\mathrm{GeV}/c^2$, we find $a_1 < 2.4 \times 10^{-21}\,\mathrm{cm^3\,s^{-1}}$ and $a_2 < 3.0 \times 10^{-18}\,\mathrm{cm^3\,s^{-1}}$ at 95% confidence, where the product of the cross-section, $\sigma$, and relative particle velocity, $v$, is given by $\sigma v = a_\ell (v/c)^{2\ell}$ with $\ell = 1, 2$ for $p$- and $d$-wave annihilation, respectively. Our bounds, although failing to exclude the thermal relic cross-section for velocity-dependent annihilation channels, are among the tightest to date.
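
    The quoted scaling $\sigma v = a_\ell (v/c)^{2\ell}$ is why massive halos dominate the signal: the sketch below compares the relative annihilation rate per particle pair across systems with typical, order-of-magnitude (assumed) velocity dispersions.

    ```python
    # Velocity dependence of the annihilation rate, sigma*v = a_l * (v/c)^(2l),
    # with l = 0 (s-wave), 1 (p-wave), 2 (d-wave). Dispersions are
    # illustrative order-of-magnitude values, not fitted quantities.
    systems = {
        "dwarf spheroidal": 10.0,      # km/s, typical velocity dispersion
        "Milky Way halo":   200.0,
        "galaxy cluster":   1000.0,
    }

    for ell in (0, 1, 2):
        print(f"l = {ell}:")
        for name, v in systems.items():
            # Rate per unit a_l, relative to a dwarf spheroidal
            boost = (v / 10.0) ** (2 * ell)
            print(f"  {name:18s} relative rate x {boost:,.0f}")
    # For d-wave (l=2) a cluster outshines a dwarf by (1000/10)^4 = 1e8
    # per particle pair, before accounting for its far larger mass.
    ```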

    On the functional form of the radial acceleration relation

    No full text
    We apply a new method for learning equations from data -- Exhaustive Symbolic Regression (ESR) -- to late-type galaxy dynamics as encapsulated in the radial acceleration relation (RAR). Relating the centripetal acceleration due to baryons, $g_\text{bar}$, to the total dynamical acceleration, $g_\text{obs}$, the RAR has been claimed to manifest a new law of nature due to its regularity and tightness, in agreement with Modified Newtonian Dynamics (MOND). Fits to this relation have been restricted by prior expectations to particular functional forms, while ESR affords an exhaustive and nearly prior-free search through functional parameter space to identify the equations optimally trading accuracy with simplicity. Working with the SPARC data, we find the best functions typically satisfy $g_\text{obs} \propto g_\text{bar}$ at high $g_\text{bar}$, although the coefficient of proportionality is not clearly unity and the deep-MOND limit $g_\text{obs} \propto \sqrt{g_\text{bar}}$ as $g_\text{bar} \to 0$ is hardly evident at all. By generating mock data according to MOND with or without the external field effect, we find that symbolic regression would not be expected to identify the generating function or reconstruct successfully the asymptotic slopes. We conclude that the limited dynamical range and significant uncertainties of the SPARC RAR preclude a definitive statement of its functional form, and hence that these data alone can neither demonstrate nor rule out law-like gravitational behaviour.
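
    For reference, one widely used prior-chosen form that such fits have assumed is the RAR fitting function of McGaugh et al. (2016), sketched below with its two asymptotic regimes; the acceleration scale $g_\dagger \approx 1.2 \times 10^{-10}\,\mathrm{m\,s^{-2}}$ is the literature value.

    ```python
    # The standard RAR fitting function from the literature -- one of the
    # prior-chosen forms whose assumption ESR relaxes.
    import numpy as np

    g_dagger = 1.2e-10  # m/s^2, fitted acceleration scale from the literature

    def g_obs(g_bar):
        """RAR fit: g_obs = g_bar / (1 - exp(-sqrt(g_bar/g_dagger)))."""
        return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / g_dagger)))

    for g_bar in (1e-12, 1e-10, 1e-8):
        newton = g_obs(g_bar) / g_bar                      # -> 1 at high g_bar
        mond = g_obs(g_bar) / np.sqrt(g_bar * g_dagger)    # -> 1 as g_bar -> 0
        print(f"g_bar={g_bar:.0e}  g_obs/g_bar={newton:7.3f}  "
              f"g_obs/sqrt(g_bar*g_dagger)={mond:6.3f}")
    ```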

    The scatter in the galaxy-halo connection: a machine learning analysis

    Full text link
    We apply machine learning, a powerful method for uncovering complex correlations in high-dimensional data, to the galaxy-halo connection of cosmological hydrodynamical simulations. The mapping between galaxy and halo variables is stochastic in the absence of perfect information, but conventional machine learning models are deterministic and hence cannot capture its intrinsic scatter. To overcome this limitation, we design an ensemble of neural networks with a Gaussian loss function that predict probability distributions, allowing us to model statistical uncertainties in the galaxy-halo connection as well as its best-fit trends. We extract a number of galaxy and halo variables from the Horizon-AGN and IllustrisTNG100-1 simulations and quantify the extent to which knowledge of some subset of one enables prediction of the other. This allows us to identify the key features of the galaxy-halo connection and investigate the origin of its scatter in various projections. We find that while halo properties beyond mass account for up to 50 per cent of the scatter in the halo-to-stellar mass relation, the prediction of stellar half-mass radius or total gas mass is not substantially improved by adding further halo properties. We also use these results to investigate semi-analytic models for galaxy size in the two simulations, finding that assumptions relating galaxy size to halo size or spin are not successful. Comment: 20 pages, 11 figures. Accepted in MNRAS.
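
    A minimal sketch of the core idea: predict a mean and a variance per target and train with a Gaussian negative log-likelihood, so the network can represent intrinsic scatter. The architecture, sizes and random stand-in data are assumptions for illustration, not the paper's exact setup.

    ```python
    # Network predicting a mean and variance per target, trained with a
    # Gaussian negative log-likelihood. Illustrative architecture only.
    import torch
    import torch.nn as nn

    class GaussianNet(nn.Module):
        def __init__(self, n_features):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
            )
            self.mean = nn.Linear(64, 1)
            self.log_var = nn.Linear(64, 1)   # log-variance for stability

        def forward(self, x):
            h = self.body(x)
            return self.mean(h), self.log_var(h)

    def gaussian_nll(y, mu, log_var):
        """Negative log-likelihood of y under N(mu, exp(log_var))."""
        return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

    # Toy training loop on random stand-ins for halo features / stellar mass
    X, y = torch.randn(256, 8), torch.randn(256, 1)
    model = GaussianNet(8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):
        mu, log_var = model(X)
        loss = gaussian_nll(y, mu, log_var)
        opt.zero_grad(); loss.backward(); opt.step()
    ```

    Training an ensemble of such networks and combining their predictive distributions then separates model uncertainty from the intrinsic scatter each network reports.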

    Constraints on dark matter annihilation and decay from the large-scale structure of the nearby universe

    Full text link
    Decaying or annihilating dark matter particles could be detected through gamma-ray emission from the species they decay or annihilate into. This is usually done by modelling the flux from specific dark matter-rich objects such as the Milky Way halo, Local Group dwarfs and nearby groups. However, these objects are expected to have significant emission from baryonic processes as well, and the analyses discard gamma-ray data over most of the sky. Here we construct full-sky templates for gamma-ray flux from the large-scale structure within $\sim 200$ Mpc by means of a suite of constrained $N$-body simulations (CSiBORG) produced using the Bayesian Origin Reconstruction from Galaxies algorithm. Marginalising over uncertainties in this reconstruction, small-scale structure and parameters describing astrophysical contributions to the observed gamma-ray sky, we compare to observations from the Fermi Large Area Telescope to constrain dark matter annihilation cross-sections and decay rates through a Markov Chain Monte Carlo analysis. We rule out the thermal relic cross-section for $s$-wave annihilation for all $m_\chi \lesssim 7\,\mathrm{GeV}/c^2$ at 95% confidence if the annihilation produces $Z$ bosons, gluons or quarks less massive than the bottom quark. We infer a contribution to the gamma-ray sky with the same spatial distribution as dark matter decay at $3.3\sigma$. Although this could be due to dark matter decay via these channels with a decay rate $\Gamma \approx 3 \times 10^{-28}\,\mathrm{s^{-1}}$, we find that a power-law spectrum of index $p = -2.75^{+0.71}_{-0.46}$, likely of baryonic origin, is preferred by the data. Comment: 23 pages, 9 figures, 1 table. Submitted to Physical Review.
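
    To illustrate the statistical setup, the sketch below fits the amplitude of a putative dark-matter template alongside a power-law background to mock binned counts with a Poisson likelihood, sampled with emcee. The templates, spectra and priors are all synthetic stand-ins for the paper's full-sky analysis.

    ```python
    # Toy Poisson-likelihood MCMC for a dark-matter template amplitude on top
    # of a power-law background. All inputs below are synthetic.
    import numpy as np
    import emcee

    rng = np.random.default_rng(1)
    E = np.logspace(0, 2, 20)                              # energy bins (schematic)
    dm_template = np.exp(-0.5 * np.log(E / 10.0) ** 2)     # toy DM spectral shape
    mock_counts = rng.poisson(50 * E ** -1.5 + 5 * dm_template)

    def log_prob(theta):
        A_bg, p, A_dm = theta
        if A_bg <= 0 or A_dm < 0:                          # flat priors, physical bounds
            return -np.inf
        mu = A_bg * E ** p + A_dm * dm_template            # expected counts per bin
        return np.sum(mock_counts * np.log(mu) - mu)       # Poisson log-like (+const)

    ndim, nwalkers = 3, 16
    p0 = np.array([50.0, -1.5, 5.0]) + 0.1 * rng.normal(size=(nwalkers, ndim))
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    sampler.run_mcmc(p0, 2000, progress=False)

    A_dm_samples = sampler.get_chain(discard=500, flat=True)[:, 2]
    print("95% upper limit on A_dm:", np.percentile(A_dm_samples, 95))
    ```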